Journal: Scientific Reports
Article Title: Label-free flow cytometry of rare circulating tumor cell clusters in whole blood
doi: 10.1038/s41598-022-14003-5
Figure Legend Snippet: Machine learning model workflow. (a) Collected scattering and fluorescence data were analyzed to locate all cluster events. FP1 represents the GFP channel used for ground-truth labeling. (b) Data were first normalized using power measurements and then filtered with a second-order Butterworth filter. (c) Data from FP1 were processed separately from the cumulative scattering data (405 + 488 + 633). The built-in findpeaks.m function was used to find all local maxima in the 1.5-min data traces of both the FP1-only and the cumulative scattering data sets. (d) An intensity threshold was used to define the start and end of each peak. The threshold was set to three times the standard deviation of the entire 1.5-min data trace in the FP1 and cumulative scattering channels. (e) Peak locations and characteristics were recorded for both the ground-truth (FP1) data and the cumulative scattering data. (f) Using the locations of these clusters, a window of ± 13 points (27 samples) from each of the three scattering channels was concatenated into an 81-point feature vector. Based on FP1, each peak was labeled as either a CTCC or an NC event. (g) The generated features and labels were used to train a Gentle Adaptive Boost (GentleBoost) ensemble of boosted classification trees to classify peaks. The training set included measurements from 10 days of collections, while the test set comprised data from 5 separate days. The final model was an ensemble of 50 models, each trained on a different data set composed of the same CTCC peaks and an equal number of randomly selected NC peaks. (h) After training, the model was evaluated on the test set, classifying peaks from similarly formatted feature vectors (pseudocode is provided in Supplementary Fig. S1). (i) Performance metrics were calculated from the test-set results.
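Panel (b) names the preprocessing steps but not their parameters. The following is a minimal Python sketch, assuming that "normalized using power measurements" means dividing a channel's trace by its measured laser power and that the Butterworth filter is a low-pass filter; the cutoff, sampling rate, and all function names are hypothetical. The second-order section is designed with the bilinear transform (Q = 1/√2) and applied causally. The legend does not say whether the authors filtered causally or zero-phase (forward-backward).

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients (bilinear transform, Q = 1/sqrt(2))."""
    if not 0 < cutoff_hz < fs_hz / 2:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Coefficients are normalized so that a[0] == 1.
    b = [(1 - cos_w0) / (2 * a0), (1 - cos_w0) / a0, (1 - cos_w0) / (2 * a0)]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """Causal direct-form I filtering of a sequence with a normalized biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(yn)
        x1, x2 = xn, x1  # shift input history
        y1, y2 = yn, y1  # shift output history
    return out


def preprocess(trace, laser_power, cutoff_hz, fs_hz):
    """Hypothetical step (b): divide by laser power, then low-pass filter."""
    if laser_power <= 0:
        raise ValueError("laser power must be positive")
    normalized = [v / laser_power for v in trace]
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    return biquad(normalized, b, a)
```

Because the filter has unit gain at DC and zero gain at the Nyquist frequency, a constant trace of 2.0 measured at a laser power of 2.0 settles at 1.0, while a sample-to-sample alternating signal is driven toward 0.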
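Panels (d) to (f) describe a threshold of three standard deviations, peak start/end bounds, FP1-based labels, and a window of ± 13 points per scattering channel (3 × 27 = 81 features). The sketch below illustrates those steps under stated assumptions: it uses the sample standard deviation (N − 1, as MATLAB's std does by default), assumes the filtered trace has a baseline near zero, and uses a hypothetical labeling rule (a scattering peak is CTCC if any FP1 peak falls inside its bounds). Peaks too close to a trace edge are skipped, which the legend does not specify.

```python
import statistics

HALF_WINDOW = 13  # ± 13 samples around the peak -> 27 samples per channel


def three_sigma_threshold(trace):
    """Threshold = 3 x (sample) standard deviation of the whole 1.5-min trace."""
    return 3 * statistics.stdev(trace)


def peak_bounds(trace, peak_idx, threshold):
    """Walk outward from a peak until the signal drops below the threshold."""
    start = peak_idx
    while start > 0 and trace[start - 1] >= threshold:
        start -= 1
    end = peak_idx
    while end < len(trace) - 1 and trace[end + 1] >= threshold:
        end += 1
    return start, end


def label_peak(bounds, fp1_peaks):
    """Hypothetical rule: CTCC if any FP1 (GFP) peak lies inside the bounds."""
    start, end = bounds
    return "CTCC" if any(start <= p <= end for p in fp1_peaks) else "NC"


def feature_vector(channels, peak_idx, half_window=HALF_WINDOW):
    """Concatenate a ±half_window slice from each scattering channel.

    With three channels (405, 488, 633) and half_window = 13 this yields
    3 x 27 = 81 features. Peaks too close to a trace edge return None.
    """
    lo, hi = peak_idx - half_window, peak_idx + half_window + 1
    if lo < 0 or any(hi > len(ch) for ch in channels):
        return None
    vec = []
    for ch in channels:
        vec.extend(ch[lo:hi])
    return vec
```

For a peak at index 50 in three 100-sample channels, `feature_vector` returns samples 37 through 63 of each channel, 81 values in total.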
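Panel (g) describes an ensemble of 50 models, each trained on all CTCC peaks plus an equal number of randomly drawn NC peaks (balanced undersampling). The sketch below shows only that resampling and a way to combine the members; GentleBoost training is delegated to a caller-supplied `train_fn` (hypothetical), and majority voting is an assumption, since the legend does not state how the 50 outputs were combined.

```python
import random
from collections import Counter


def balanced_subsets(ctcc, nc, n_subsets=50, seed=0):
    """Each subset: all CTCC feature vectors plus an equal number of random NC ones."""
    if len(nc) < len(ctcc):
        raise ValueError("need at least as many NC peaks as CTCC peaks")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        nc_draw = rng.sample(nc, len(ctcc))  # drawn without replacement
        features = list(ctcc) + nc_draw
        labels = [1] * len(ctcc) + [0] * len(nc_draw)  # 1 = CTCC, 0 = NC
        subsets.append((features, labels))
    return subsets


def train_ensemble(ctcc, nc, train_fn, n_subsets=50, seed=0):
    """train_fn(features, labels) -> callable model; a stand-in for GentleBoost."""
    return [train_fn(f, l) for f, l in balanced_subsets(ctcc, nc, n_subsets, seed)]


def predict_majority(models, x):
    """Majority vote over ensemble members; ties resolve to NC (0)."""
    votes = Counter(model(x) for model in models)
    return 1 if votes[1] > votes[0] else 0
```

Undersampling the NC class this way keeps every rare CTCC example in every member while still exposing the ensemble, as a whole, to many different NC peaks.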
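Panel (i) does not list which performance metrics were calculated. As an illustration only, the sketch below computes common confusion-matrix metrics from test-set labels and predictions, treating CTCC (label 1) as the positive class; the function name and the choice of metrics are assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics with CTCC (label 1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` this gives 2 true positives, 1 true negative, 1 false positive and 1 false negative, so precision, recall and F1 are all 2/3 and accuracy is 0.6.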
Article Snippet: The cumulative signal of each 1.5-min segment was analyzed using the built-in MATLAB function findpeaks.m from the Signal Processing Toolbox to identify peaks.
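MATLAB's findpeaks accepts name-value options such as MinPeakHeight, and the text does not say which options, if any, were used. The sketch below is a plain-Python stand-in, not a reimplementation of findpeaks: it sums the three scattering channels into a cumulative signal and reports samples higher than both neighbours, with an optional minimum height. End points are never reported, and a flat top is reported at its first sample; the function names are hypothetical.

```python
def cumulative_signal(ch405, ch488, ch633):
    """Sum the three scattering channels sample by sample (405 + 488 + 633)."""
    return [a + b + c for a, b, c in zip(ch405, ch488, ch633)]


def find_local_maxima(x, min_height=None):
    """Indices of local maxima; a flat top is reported at its first sample.

    End points are never reported, and min_height (optional) drops low peaks.
    """
    peaks = []
    n = len(x)
    i = 1
    while i < n - 1:
        if x[i] > x[i - 1]:
            j = i
            while j + 1 < n and x[j + 1] == x[i]:  # skip across a plateau
                j += 1
            if j + 1 < n and x[j + 1] < x[i]:  # falls after the rise: a peak
                if min_height is None or x[i] >= min_height:
                    peaks.append(i)
            i = j + 1
        else:
            i += 1
    return peaks
```

For example, `find_local_maxima([0, 1, 0, 2, 2, 0, 3])` returns `[1, 3]`: the plateau at indices 3 and 4 is reported once, and the final sample is excluded because it is an end point.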
Techniques: Fluorescence, Labeling, Standard Deviation, Plasmid Preparation, Generated